Sparse Overcomplete Latent Variable Decomposition of Counts Data
نویسندگان
چکیده
An important problem in many fields is the analysis of counts data to extract meaningful latent components. Methods like Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA) have been proposed for this purpose. However, they are limited in the number of components they can extract and lack an explicit provision to control the “expressiveness” of the extracted components. In this paper, we present a learning formulation to address these limitations by employing the notion of sparsity. We start with the PLSA framework and use an entropic prior in a maximum a posteriori formulation to enforce sparsity. We show that this allows the extraction of overcomplete sets of latent components which better characterize the data. We present experimental evidence of the utility of such representations.
منابع مشابه
Learning Overcomplete Latent Variable Models through Tensor Methods
We provide guarantees for learning latent variable models emphasizing on the overcomplete regime, where the dimensionality of the latent space exceeds the observed dimensionality. In particular, we consider multiview mixtures, ICA, and sparse coding models. Our main tool is a new algorithm for tensor decomposition that works in the overcomplete regime. In the semi-supervised setting, we exploit...
متن کاملSample Complexity Analysis for Learning Overcomplete Latent Variable Models through Tensor Methods
We provide guarantees for learning latent variable models emphasizing on the overcomplete regime, where the dimensionality of the latent space can exceed the observed dimensionality. In particular, we consider multiview mixtures, spherical Gaussian mixtures, ICA, and sparse coding models. We provide tight concentration bounds for empirical moments through novel covering arguments. We analyze pa...
متن کاملA Mixture Model for Learning Sparse Representations
In a latent variable model, an overcomplete representation is one in which the number of latent variables is at least as large as the dimension of the data observations. Overcomplete representations have been advocated due to robustness in the presence of noise, the ability to be sparse, and an inherent flexibility in modeling the structure of data [9]. In this report, we modify factor analysis...
متن کاملAnalyzing Tensor Power Method Dynamics in Overcomplete Regime
We present a novel analysis of the dynamics of tensor power iterations in the overcomplete regime where the tensor CP rank is larger than the input dimension. Finding the CP decomposition of an overcomplete tensor is NP-hard in general. We consider the case where the tensor components are randomly drawn, and show that the simple power iteration recovers the components with bounded error under m...
متن کاملWhen are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity
Overcomplete latent representations have been very popular for unsupervised feature learning in recent years. In this paper, we specify which overcomplete models can be identified given observable moments of a certain order. We consider probabilistic admixture or topic models in the overcomplete regime, where the number of latent topics can greatly exceed the size of the observed word vocabular...
متن کامل